Efficiency Improvements for Parallel Subgraph Miners

Authors

  • Abhik Ray
  • Lawrence B. Holder
Abstract

Algorithms for finding frequent and/or interesting subgraphs in a single large graph are computationally intensive because they must solve the graph isomorphism and subgraph isomorphism problems. These costs are compounded by the size of most real-world datasets, which have sizes in the order of 10 or 10. The SUBDUE algorithm, developed by Cook and Holder, finds the most compressing subgraph in a large graph. To perform the same task efficiently on real-world datasets, Cook et al. developed a parallel approach to SUBDUE called SP-SUBDUE, based on the MPI framework. This paper extends the work of Cook et al. to improve the efficiency of MPI SUBDUE by modifying the evaluation phase. Our experiments show an improvement in speed-up while retaining the quality of the results of serial SUBDUE. The techniques used in this study can also be applied to similar algorithms that use static partitioning of the data and re-evaluation of locally interesting patterns over all the nodes of the cluster.
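
The general scheme named in the last sentence (static partitioning, then re-evaluation of locally interesting patterns on every node) can be illustrated with a toy sketch. This is not the authors' implementation and uses no MPI: the workers are simulated by a loop, patterns are reduced to single labeled edges so that no isomorphism test is needed, and a plain occurrence count stands in for SUBDUE's compression value. The function names `local_candidates`, `evaluate`, and `mine` are hypothetical.

```python
from collections import Counter

# Assumption: each partition is a list of labeled edges
# (source_label, edge_label, target_label). Real miners handle full subgraphs.

def local_candidates(partition, k):
    """Return the k most frequent single-edge patterns in one partition."""
    return [pattern for pattern, _ in Counter(partition).most_common(k)]

def evaluate(partition, candidates):
    """Count how often each candidate pattern occurs in one partition."""
    counts = Counter(partition)
    return {pattern: counts[pattern] for pattern in candidates}

def mine(partitions, k=1):
    # Phase 1: every (simulated) worker proposes its locally best patterns.
    candidates = set()
    for part in partitions:
        candidates.update(local_candidates(part, k))
    # Phase 2: every worker re-evaluates all candidates on its own
    # partition, and the per-partition scores are summed.
    total = Counter()
    for part in partitions:
        total.update(evaluate(part, candidates))
    # Sorting first makes the tie-break deterministic.
    best = max(sorted(total), key=lambda pattern: total[pattern])
    return best, dict(total)

partitions = [
    [("A", "x", "B"), ("A", "x", "B"), ("C", "y", "D")],
    [("C", "y", "D"), ("C", "y", "D"), ("C", "y", "D"), ("A", "x", "B")],
]
best, scores = mine(partitions, k=1)
print(best, scores)
```

In this toy input each partition proposes a different local winner, and only the summed re-evaluation decides which pattern is best overall. In an MPI setting the two loops would typically become collective gather and reduce steps, which is the part of the pipeline the abstract says the paper modifies.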

Similar resources

Frequent Subgraph Miners: Runtimes Don’t Say Everything

In recent years several frequent subgraph miners were proposed. The authors of these new algorithms typically compared the runtimes of their implementations with those of previous implementations to confirm the efficiency of their methods. To get a better perspective on the mutual benefits of the algorithms, Wörlein et al. [9] performed an experimental evaluation of re-implementations of severa...

The ParMol Package for Frequent Subgraph Mining

Mining for frequent subgraphs in a graph database has become a popular topic in recent years. Algorithms to solve this problem are used in chemoinformatics to find common molecular fragments in a database of molecules represented as two-dimensional graphs. However, the search process in arbitrary graph structures includes costly graph and subgraph isomorphism tests. In our ParMol package we h...

Graph-Based Knowledge Discovery: Compression versus Frequency

There are two primary types of graph-based data miners: frequent subgraph miners and compression-based miners. With frequent subgraph miners, the most interesting substructure is the largest one (or ones) that meets the minimum support, whereas compression-based graph miners discover those subgraphs that maximize the amount of compression that a particular substructure provides a graph. The algorithms...

A new algorithm for mining frequent connected subgraphs based on adjacency matrices

Most of the Frequent Connected Subgraph Mining (FCSM) algorithms have been focused on detecting duplicate candidates using canonical form (CF) tests. CF tests have high computational complexity, which affects the efficiency of graph miners. In this paper, we introduce novel properties of the canonical adjacency matrices for reducing the number of CF tests in FCSM. Based on these properties, a n...

On Speeding up Frequent Approximate Subgraph Mining

Frequent approximate subgraph (FAS) mining has become an interesting task with wide applications in several domains of science. Most previous studies have focused on reducing the search space or the number of canonical form (CF) tests. CF tests are commonly used for duplicate detection; however, these tests affect the efficiency of the mining process because they have high computational...

Publication date: 2012